Boosting Text Compression with Word-Based Statistical Encoding



Similar resources

Boosting Text Compression with Word-Based Statistical Encoding

Semistatic word-based byte-oriented compressors are known to be attractive alternatives for compressing natural language texts. With compression ratios around 30-35%, they allow fast direct searching of compressed text. In this article we reveal that these compressors have even more benefits. We show that most of the state-of-the-art compressors benefit from compressing not the original text, but t...


Word-based Text Compression

Summary: The development of efficient algorithms to support arithmetic coding has meant that powerful models of text can now be used for data compression. Here the implementation of models based on recognising and recording words is considered. Move-to-front and several variable-order Markov models have been tested with a number of different data structures, and first the decisions that went i...


Word-Based Text Compression

Today there are many universal compression algorithms, but in most cases it is better to use a specific algorithm for specific data: JPEG for images, MPEG for movies, etc. For textual documents there are special methods based on the PPM algorithm or methods with non-character access, e.g. word-based compression. In the past, several papers describing variants of word-based compression using Huffman encodin...


Boosting Statistical Word Alignment

This paper proposes an approach to improve statistical word alignment with the boosting method. Applying boosting to word alignment must solve two problems. The first is how to build the reference set for the training data. We propose an approach to automatically build a pseudo reference set, which can avoid manual annotation of the training set. The second is how to calculate the error rate of...


Enhanced Word-Based Block-Sorting Text Compression

The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing se...



Journal

Journal title: The Computer Journal

Year: 2011

ISSN: 0010-4620, 1460-2067

DOI: 10.1093/comjnl/bxr096